Very Large Lexical Databases: An ACL Tutorial

Authors

  • James Pustejovsky
  • Patrick Hanks
Abstract

The WWW is two orders of magnitude larger than the largest corpora. Although noisy, web text presents language as it is used, and statistics derived from the Web can have practical uses in many NLP applications. For this reason, the WWW should be seen and studied as any other computationally available linguistic resource. In this article, we illustrate this by showing that an Example-Based approach to lexical choice for machine translation can use the Web as an adequate and free resource.

Similar resources

Parsing in the Absence of a Complete Lexicon

It is impractical for natural language parsers which serve as front ends to large or changing databases to maintain a complete in-core lexicon of words and meanings. This note discusses a practical approach to using alternative sources of lexical knowledge by postponing word categorization decisions until the parse is complete, and resolving remaining lexical ambiguities using a variety of inf...

Deriving Verbal and Compositional Lexical Aspect for NLP Applications

Verbal and compositional lexical aspect provide the underlying temporal structure of events. Knowledge of lexical aspect, e.g., (a)telicity, is therefore required for interpreting event sequences in discourse (Dowty, 1986; Moens and Steedman, 1988; Passonneau, 1988), interfacing to temporal databases (Androutsopoulos, 1996), processing temporal modifiers (Antonisse, 1994), describing allowable a...

Multilingual Lexical Database Generation from Parallel Texts in 20 European Languages with Endogenous Resources

This paper deals with multilingual database generation from parallel corpora. The idea is to contribute to the enrichment of lexical databases for languages with few linguistic resources. Our approach is endogenous: it relies on the raw texts only, it does not require external linguistic resources such as stemmers or taggers. The system produces alignments for the 20 European languages of the ‘...

Aligning WordNet with Additional Lexical Resources

This paper explores the relationship between WordNet and other conventional linguistically-based lexical resources. We introduce an algorithm for aligning word senses from different resources, and use it in our experiment to sketch the role played by WordNet, as far as sense discrimination is concerned, when put in the context of other lexical databases. The results show how and where the resou...

Methods for the Qualitative Evaluation of Lexical Association Measures

This paper presents methods for a qualitative, unbiased comparison of lexical association measures and the results we have obtained for adjective-noun pairs and preposition-noun-verb triples extracted from German corpora. In our approach, we compare the entire list of candidates, sorted according to the particular measures, to a reference set of manually identified “true positives”. We also sho...

Publication date: 2001